ABSTRACT

We present a measurement of the probability distribution function (PDF) of the
transmitted flux in the Lyα forest from a sample of 3492 quasars included in the SDSS
DR3 data release. Our intention is to investigate the sensitivity of the Lyα flux PDF,
as measured from low resolution and low signal-to-noise data, to a number of systematic
errors such as uncertainties in the mean flux, continuum and noise estimate. The
quasar continuum is described by the superposition of a power law and emission lines.
We perform a power law continuum fitting on a spectrum-by-spectrum basis, and obtain
an average continuum slope of $\alpha_\nu = 0.59 \pm 0.36$ in the redshift range
2.5 < z < 3.5. Taking into account the variation in the continuum indices increases
the mean flux by 3 and 7 per cent at z = 3 and 2.4, respectively, as compared to the
values inferred with a single (mean) continuum slope. We compare our measurements to
the PDF obtained with mock lognormal spectra, whose statistical properties have been
constrained to match the observed Lyα flux PDF and power spectrum of high resolution
data. Using our power law continuum fitting and the SDSS pipeline noise estimate
yields a poor agreement between the observed and mock PDFs. Allowing for a break
in the continuum slope and, more importantly, for residual scatter in the continuum
level substantially improves the agreement. A decrease of $\simeq 10$-15 per cent in
the mean quasar continuum with a typical rms variance at the 20 per cent level can
account for the data, provided that the noise excess correction is no larger than
$\lesssim 10$ per cent.

Key words: cosmology: theory - gravitation - dark matter - baryons - intergalactic
medium



1 INTRODUCTION 

The Lyα forest seen in quasar spectra probes the intergalactic medium (IGM) and the
underlying matter distribution over a wide range of scales
($k \sim 0.1$-$10\,h\,{\rm Mpc}^{-1}$) and redshifts ($1 \lesssim z \lesssim 6$).
Measurements of the mean flux in the Lyα forest shed light on the reionization history
and the physical state of the IGM in the post-reionization era (Press, Rybicki &
Schneider 1993, hereafter P93; Rauch et al. 1997; Bernardi et al. 2003, hereafter B03;
Bolton & Haehnelt 2006). Fluctuations in the Lyα flux are of great interest since they
provide information on the matter distribution on scales smaller than those accessible
to other observables (e.g. Croft et al. 1998; 1999; 2002b; Nusser & Haehnelt 1999;
2000; Pichon et al. 2002; Zaldarriaga, Hui & Tegmark 2003; McDonald et al. 2000;
McDonald et al. 2005a). Combined with CMB observations, the power spectrum of the Lyα
forest can provide stringent constraints on the shape and amplitude of the primordial
power spectrum (Seljak et al. 2004; Viel & Haehnelt 2005).

The probability distribution function (PDF) of the Lyα transmitted flux was first
studied by Jenkins & Ostriker (1991). It has, however, received less attention in
recent years as it is more sensitive to systematic errors such as continuum fitting
uncertainties. Yet, the tension between the WMAP and Lyα values of the normalisation
amplitude $\sigma_8$ argues in favour of incorporating statistics other than the Lyα
flux power spectrum. Rauch et al. (1997) and McDonald et al. (2000) have computed the
PDF of the Lyα transmitted flux from high resolution data and found that ΛCDM
cosmologies provide a good fit to the observations. Gaztañaga & Croft (1999) have
provided an analytic description of the Lyα flux PDF based on perturbation theory.
Choudhury, Srianand & Padmanabhan (2001) and Desjacques & Nusser (2005) have
investigated the constraints obtained from a joint fit of the PDF and power spectrum
of the Lyα transmitted flux. Becker, Rauch & Sargent (2006) have examined the redshift
evolution of the Lyα flux PDF from a large sample of Keck HIRES data. Note also that
Lidz et al. (2005) have advocated working with the PDF of the fluctuations in the flux
about the mean, as it is insensitive to the quasar continuum.

The Sloan Digital Sky Survey (SDSS; York et al. 2000) has greatly increased the
statistical power of the Lyα forest. Unfortunately, attempts to exploit the data are
plagued by a number of poorly constrained parameters that describe the physical state
of the IGM, by inaccuracies in the numerical modelling of the Lyα forest, and by
systematic errors in the measurement such as continuum fitting uncertainties. The
observed flux in the Lyα region of a quasar (QSO) spectrum depends both on the quasar
continuum (the flux emitted by the quasar, including the emission lines) and on the
amount of absorption by intervening galactic matter (lines and continuum absorptions).
The transmitted (or normalized) flux obtained by continuum fitting of the observed
spectrum is related to the optical depth along the line of sight as

$$ F(w) = I_{\rm obs}(w)/I_{\rm cont} = e^{-\tau(w)} , \qquad (1) $$

where $\tau$ is the optical depth, $w$ is the redshift space coordinate along the line
of sight, $I_{\rm obs}$ is the observed flux and $I_{\rm cont}$ is the flux emitted
from the source (quasar) that would be observed in the absence of any intervening
material. To estimate the unabsorbed continuum, two different approaches have been
used. When the signal-to-noise is large, a polynomial continuum is fitted to regions
of the Lyα forest free of absorption (e.g. Rauch et al. 1997; McDonald et al. 2000).
When the signal-to-noise is low, one usually extrapolates the continuum redward of the
Lyα emission line, assuming a power law shape. The latter method only provides an
approximation to the true continuum, and can easily introduce several types of
systematic errors (e.g. Steidel & Sargent 1987). The strong degeneracy between the
effective optical depth $\tau_{\rm eff} = -\ln\langle F\rangle$ and the continuum
level affects the determination of the clustering amplitude $\sigma_8$. Measurements
of the effective optical depth from low resolution quasar spectra with comparable
signal-to-noise (P93; B03) are systematically higher than those based on
high-resolution spectra by 10-20 per cent. As argued by Seljak et al. (2003), Tytler
et al. (2004) and Viel et al. (2004b), this is most likely due to systematic errors in
the continuum fitting procedure of the low resolution spectra. Furthermore, the low
signal-to-noise of SDSS spectra in the Lyα region (S/N ~ 3 typically) requires an
accurate noise estimate. Systematic errors in the noise characterisation will affect
the accuracy of the measurements. In this respect, several lines of recent evidence
suggest that the SDSS reduction pipeline underestimates the true noise by 5-10 per
cent (McDonald et al. 2006; Burgess 2004).
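To make equation (1) and the power-law extrapolation concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the fitting procedure used in this paper: the function names, the toy continuum and the wavelength windows are all hypothetical, and the continuum is parametrised simply as a power law in wavelength.

```python
import math


def fit_power_law(wavelengths, fluxes):
    """Least-squares fit of ln f = ln(amp) + slope * ln(lambda)."""
    xs = [math.log(w) for w in wavelengths]
    ys = [math.log(f) for f in fluxes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    amp = math.exp(mean_y - slope * mean_x)
    return amp, slope


def transmitted_flux(observed, wavelengths, amp, slope):
    """Equation (1): F = I_obs / I_cont, with I_cont = amp * lambda**slope."""
    return [obs / (amp * w ** slope) for obs, w in zip(observed, wavelengths)]


def effective_optical_depth(flux):
    """tau_eff = -ln<F>, where <F> is the mean transmitted flux."""
    return -math.log(sum(flux) / len(flux))


def toy_continuum(w):
    """Hypothetical unabsorbed continuum (arbitrary units, rest-frame Angstrom)."""
    return 5.0 * (w / 1300.0) ** -1.5


# Fit redward of the Lya emission line, where absorption is negligible.
red_wave = [1300.0 + 10.0 * i for i in range(30)]
amp, slope = fit_power_law(red_wave, [toy_continuum(w) for w in red_wave])

# Extrapolate blueward into the forest and normalise the absorbed spectrum.
true_tau = 0.35  # uniform toy absorption
forest_wave = [1080.0 + 10.0 * i for i in range(9)]
forest_obs = [toy_continuum(w) * math.exp(-true_tau) for w in forest_wave]
flux = transmitted_flux(forest_obs, forest_wave, amp, slope)
tau_eff = effective_optical_depth(flux)
```

In this toy case the extrapolated continuum is exact, so the recovered effective optical depth equals the input value. With real spectra, any error in the extrapolated continuum level feeds directly into the inferred mean flux, which is the degeneracy discussed above.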

In this paper, we measure the PDF of the Lyα transmitted flux from a large (public)
sample of quasars included in the SDSS DR3 data release (Abazajian et al. 2005). We
assume that the quasar continuum follows the parametric form given in B03. However,
unlike B03, we also allow for the variation in the continuum indices of individual
spectra and fit a power law continuum for each spectrum separately. We compare our
measurements to the probability distribution obtained from realistic-looking lognormal
mock SDSS spectra. The statistical properties of these mock spectra are constrained
beforehand to match the observed Lyα flux probability distribution and power spectrum
of high resolution data of the forest. Our intention is to investigate the sensitivity
of the Lyα flux PDF, as measured from low resolution, low signal-to-noise data, to a
number of systematic errors such as uncertainties in the mean flux, continuum and
noise estimate.

The paper is organised as follows. We briefly review the lognormal model of the Lyα
forest in §2. The constraints on the model parameters are discussed in §3. The
continuum fitting procedure and the measurement of the PDF of the Lyα flux are
presented in §4. In §5, we compare the simulated and observed PDF, and study the
effect of a number of systematic errors. We discuss our results in §6. In §7, we
conclude and indicate potential future work. We will present results for a ΛCDM
cosmology with normalisation amplitude $\sigma_8 = 0.83$ and spectral index
$n_s = 0.96$. This is consistent with the constraints obtained from the latest CMB and
Lyα forest data (Spergel et al. 2006; Viel, Haehnelt & Lewis 2006; Seljak, Slosar &
McDonald 2006).



2 GENERATING MOCK QUASAR SPECTRA 

We implement the lognormal model introduced by Bi and collaborators (Bi, Börner & Chu
1992; Bi 1993; Bi & Davidsen 1997; see also Choudhury, Padmanabhan & Srianand 2001;
Viel et al. 2002) to simulate the distribution of low-column density Lyα absorption
lines along the line of sight (LOS) to quasars. The main advantage of this procedure
is that simulated spectra can have an arbitrarily large length. This allows us to
eliminate periodicity effects that are present in simulations, where the typical box
size is noticeably smaller than the total length of a single spectrum. Furthermore,
this approach is computationally very efficient as compared to N-body simulations.



2.1 The lognormal model of the Lyα forest

The lognormal model of the IGM is based on the assumption that the low-column density
Lyα forest is produced by mildly nonlinear fluctuations ($\delta\rho/\rho \lesssim 10$)
which smoothly trace the dark matter distribution. The IGM density contrast $\delta_b$
is obtained from a local mapping of the linear IGM density contrast $\delta_L$ (Coles &
Jones 1991), but the IGM peculiar velocity along the line of sight is assumed to be
linear even on scales where the density contrast becomes non-linear (Bi & Davidsen
1997). This assumption is motivated by the continuity equation, which reads
$\nabla\cdot{\bf v} \propto -{\rm d}\ln(1+\delta)/{\rm d}t$ if one neglects the
coupling $\delta{\bf v}$. Namely, we have

$$ \delta_b(x,z) = \exp\left[\delta_L(x,z) - \sigma_L^2(z)/2\right] - 1 $$
$$ v_b(x,z) = v_L(x,z) , \qquad (2) $$

where $\sigma_L(z)$ is the rms fluctuation of the linear IGM density field at redshift
$z$. The linear IGM density and peculiar velocity, $\delta_L$ and $v_L$, are obtained
by smoothing the linear matter density and velocity distribution on some
characteristic scale $x_F = 1/k_F$ to mimic pressure smoothing. The linear IGM
clustering amplitude, $\sigma_L$, is thus

$$ \sigma_L^2(z) = D(z)^2 \int_0^\infty {\rm d}\ln k\; \Delta_L^2(k)\, W_b^2(k,k_F) , \qquad (3) $$

where $D(z)$ is the linear density growth factor, $\Delta_L^2(k)$ is the dimensionless,
linear matter power spectrum at the present epoch (Peebles 1980) and $W_b(k,k_F)$ is
the IGM filter. We take the k-space filter $W_b$ to be a Gaussian. Such a filter gives
a good fit to the gas fluctuations over a wide range of wavenumbers (Gnedin et al.
2003; see also Zaroubi et al. 2005). In principle, we expect $k_F$ to depend on the
physical state of the IGM. However, since the relation between $k_F$ and $T_4$ depends
noticeably on the reionization history of the Universe (Gnedin & Hui 1998; Nusser
2000), it is more convenient to treat $k_F$ as a free parameter.
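The filtered variance and the lognormal mapping above can be illustrated with a short sketch. Several ingredients here are assumptions made for illustration only: the Gaussian filter is written as exp(-k^2 / 2 k_F^2), whose exact normalisation the text does not specify, the power spectrum is a toy power law rather than a ΛCDM spectrum, and all function names are hypothetical.

```python
import math
import random


def gaussian_filter(k, k_f):
    """Assumed Gaussian IGM filter W_b(k, k_F) = exp(-k^2 / (2 k_F^2))."""
    return math.exp(-0.5 * (k / k_f) ** 2)


def sigma_l_squared(delta2, k_f, growth=1.0, lnk_min=-12.0, lnk_max=8.0, n=4000):
    """Trapezoidal estimate of D^2 * int dlnk Delta_L^2(k) W_b^2(k, k_F)."""
    step = (lnk_max - lnk_min) / n
    total = 0.0
    for i in range(n + 1):
        k = math.exp(lnk_min + i * step)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * delta2(k) * gaussian_filter(k, k_f) ** 2
    return growth ** 2 * total * step


def lognormal_contrast(delta_l, sigma2):
    """Lognormal mapping: delta_b = exp(delta_L - sigma_L^2 / 2) - 1."""
    return math.exp(delta_l - 0.5 * sigma2) - 1.0


def toy_power(k):
    """Toy dimensionless power spectrum (not a LCDM spectrum)."""
    return 0.01 * k ** 2


# For Delta^2 = A k^2 the integral is A * k_F^2 / 2, i.e. 0.5 for k_F = 10.
sigma2 = sigma_l_squared(toy_power, k_f=10.0)

# The mapping keeps the density positive and, on average, preserves the mean.
rng = random.Random(1)
samples = [lognormal_contrast(rng.gauss(0.0, math.sqrt(sigma2)), sigma2)
           for _ in range(100000)]
mean_contrast = sum(samples) / len(samples)
```

The factor exp(-sigma_L^2 / 2) is what keeps the mean of 1 + delta_b equal to unity, which the Monte Carlo average above checks numerically.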

We calculate the Lyα transmitted flux $F = \exp(-\tau)$ in the fluctuating
Gunn-Peterson approximation (Gunn & Peterson 1965; Bahcall & Salpeter 1965). The
optical depth for Lyα resonant scattering at some redshift space position $w$ is
expressed as a convolution of the real space H I density along the line of sight with
a Voigt profile ${\cal H}$,

$$ \tau(w) = \frac{c\,\sigma_0}{H(z)} \int {\rm d}x\; n_{\rm HI}(x)\, {\cal H}\left[w - x - v_b(x),\, b(x)\right] , \qquad (4) $$

where $\sigma_0 = 4.45 \times 10^{-18}\,{\rm cm}^2$ is the effective cross-section for
resonant line scattering, $H(z)$ is the Hubble constant at redshift $z$, $x$ is the
real space coordinate, $n_{\rm HI}(x)$ is the neutral hydrogen density, $b(x)$ is the
Doppler parameter due to thermal/turbulent broadening and ${\cal H}$ is the Voigt
profile, which can be approximated by a Gaussian for moderate optical depths. The H I
density, $n_{\rm HI}(x)$, and the Doppler parameter, $b(x)$, are computed using a tight
polytropic relation (Katz et al. 1996; Hui & Gnedin 1997; Theuns et al. 1998) that
produces results comparable to full hydrodynamical simulations,

$$ n_{\rm HI}(x) = \bar n_{\rm HI}\,(1+\delta_b)^{2-0.7(\gamma-1)} $$
$$ b(x) = \bar b\,(1+\delta_b)^{(\gamma-1)/2} , \qquad (5) $$

where the adiabatic index $\gamma$ is in the range 1-1.6. $\bar n_{\rm HI}$ and
$\bar b$ are the H I density and Doppler parameter at mean gas density, respectively.
The latter is a function of the IGM temperature,
$\bar b = 13\,{\rm km\,s^{-1}}\,T_4^{1/2}$, where $T_4$ is the IGM temperature at mean
density (in units of $10^4$ K). Note that, since $\bar n_{\rm HI}$ is usually
constrained by fixing the mean flux level, $\langle F\rangle$, we will treat
$\langle F\rangle$ as a free parameter in the remainder of this paper.



2.2 The simulation method

We simulate the Lyα transmitted flux in a periodic line of sight of comoving length
$L$ at $N$ discrete r-space positions $x_i$, $i = 1, \ldots, N$. Following Bi (1993),
we generate two fields $\delta_L(k)$ and $v_L(k)$ at discrete k-space positions. The
two fields are correlated Gaussian random fields that can be written as linear
combinations of two independent fields $u(k)$ and $w(k)$ having (dimensionless) power
spectra $\Delta_u^2(k)$ and $\Delta_w^2(k)$,

$$ \delta_L(k,z) = D(z)\,\left[u(k) + w(k)\right] $$
$$ v_L(k,z) = i\,q\,E(z)\,\frac{[\ldots]}{c}\,f(k)\,w(k) , \qquad (6) $$

where $E(z)$ is the linear growth factor of the velocity field. $\Delta_u^2(k)$,
$\Delta_w^2(k)$ and $f(k)$ are constrained by the auto- and cross-correlations of the
density and velocity fields,

where $\Delta^2_{\rm 3D}$ and $\Delta^2_{\rm 1D}$ are the 3D and 1D power spectra,
respectively. We obtain the linear IGM density and peculiar velocity field in r-space
from a fast Fourier transform (FFT), and calculate the mean transmitted flux by
combining equations (2), (4) and (5). For a given cosmological model, the flux
distribution depends only on the filtering wavenumber $k_F$, the mean flux
$\langle F\rangle$, the adiabatic index $\gamma$ and the mean IGM temperature $T_4$.
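The chain from a periodic Gaussian field to a transmitted flux with a prescribed mean can be sketched as follows. This is a deliberately simplified illustration, not the method of this paper: peculiar velocities and thermal broadening are ignored, the Fourier modes are summed directly instead of using an FFT, the per-mode variance is a toy function, and the optical depth is assumed to scale as (1 + delta_b)^(2 - 0.7(gamma - 1)), the usual fluctuating Gunn-Peterson scaling. All names are hypothetical.

```python
import math
import random


def gaussian_field(n_pix, mode_variance, rng):
    """Periodic zero-mean Gaussian field from a direct sum of Fourier modes."""
    field = [0.0] * n_pix
    for m in range(1, n_pix // 2):
        amp = math.sqrt(mode_variance(m))
        a, b = rng.gauss(0.0, amp), rng.gauss(0.0, amp)
        for i in range(n_pix):
            phase = 2.0 * math.pi * m * i / n_pix
            field[i] += a * math.cos(phase) + b * math.sin(phase)
    return field


def flux_from_density(delta_b, amplitude, gamma):
    """F = exp(-tau) with tau = amplitude * (1 + delta_b)^(2 - 0.7 (gamma - 1))."""
    beta = 2.0 - 0.7 * (gamma - 1.0)
    return [math.exp(-amplitude * (1.0 + d) ** beta) for d in delta_b]


def normalise_to_mean_flux(delta_b, gamma, target, lo=1e-6, hi=1e4):
    """Bisect (in log space) on the optical depth amplitude to match <F>."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        flux = flux_from_density(delta_b, mid, gamma)
        if sum(flux) / len(flux) > target:
            lo = mid  # too transparent: raise the amplitude
        else:
            hi = mid
    return math.sqrt(lo * hi)


rng = random.Random(42)
delta_l = gaussian_field(256, lambda m: 0.05 * math.exp(-(m / 30.0) ** 2), rng)
sigma2 = sum(d * d for d in delta_l) / len(delta_l)

# Lognormal mapping of equation (2), then normalisation to a chosen mean flux.
delta_b = [math.exp(d - 0.5 * sigma2) - 1.0 for d in delta_l]
amplitude = normalise_to_mean_flux(delta_b, gamma=1.4, target=0.7)
flux = flux_from_density(delta_b, amplitude, gamma=1.4)
mean_flux = sum(flux) / len(flux)
```

Fixing the amplitude by matching the mean flux is the reason the mean flux, rather than the mean H I density, can be treated as the free parameter of the model.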



2.3 Properties of the synthetic spectra 

To create realistic mock spectra, we compute the flux distribution on a
one-dimensional (1D) grid whose resolution $\Delta$ is fine enough to resolve the
smallest structure on the filtering scale, and whose length $L$ is long enough to
incorporate most of the fluctuation power. We typically have
$\Delta \lesssim 0.002\,h^{-1}{\rm Mpc}$ and $L \simeq 1000\,h^{-1}{\rm Mpc}$
(comoving). We constrain the mean flux level $\langle F\rangle$ from a large sample of
idealised, noise-free spectra. Instrumental resolution, noise and strong absorption
systems are included as described below. Note that, in low resolution spectra,
$\langle F\rangle$ will generally differ from the "effective" mean flux $\bar F$ of
the processed spectra, which include a number of strong absorption systems.



2.3.1 Instrumental noise and resolution 

We attempt to include instrumental noise and resolution in our synthetic spectra in a
way that mimics the observations as closely as possible (e.g. Rauch et al. 1997). We
smooth the spectra with a Gaussian of constant width, and re-sample them on pixels of
wavelength size $\Delta\lambda$, interpolating between adjacent values in the fine
grid. The spectra are not modified to include continuum fitting uncertainties as the
latter are taken out from the observed spectra.
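The degradation steps described above (Gaussian smoothing, re-sampling by linear interpolation, and the addition of noise) can be sketched as below. This is an illustrative sketch only: it works on a velocity grid, assumes periodic boundaries, uses a constant toy noise level, and borrows a 170 km/s FWHM and a pixel of 10^-4 in log10(lambda) as example numbers. All function names and the fine-grid pixel size are hypothetical.

```python
import math
import random

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))
C_KMS = 299792.458  # speed of light in km/s


def gaussian_smooth(flux, pixel_kms, fwhm_kms):
    """Convolve a periodic spectrum with a Gaussian of the given FWHM."""
    sigma_pix = fwhm_kms * FWHM_TO_SIGMA / pixel_kms
    half = int(math.ceil(4.0 * sigma_pix))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (j / sigma_pix) ** 2) for j in offsets]
    norm = sum(kernel)
    n = len(flux)
    return [sum(w * flux[(i + j) % n] for j, w in zip(offsets, kernel)) / norm
            for i in range(n)]


def resample(flux, fine_step, coarse_step):
    """Linearly interpolate between adjacent fine-grid values."""
    n_out = int((len(flux) - 1) * fine_step / coarse_step) + 1
    out = []
    for m in range(n_out):
        pos = m * coarse_step / fine_step
        i = min(int(pos), len(flux) - 2)
        frac = pos - i
        out.append((1.0 - frac) * flux[i] + frac * flux[i + 1])
    return out


def add_noise(flux, rms, rng):
    """Add independent Gaussian noise with a per-pixel rms."""
    return [f + rng.gauss(0.0, s) for f, s in zip(flux, rms)]


fine_pixel = 10.0                                  # km/s, hypothetical fine grid
coarse_pixel = C_KMS * math.log(10.0) * 1e-4       # about 69 km/s
sigma_kms = 170.0 * FWHM_TO_SIGMA                  # about 72 km/s

fine_flux = [1.0] * 512
fine_flux[200] = 0.0                               # one narrow absorption feature
smoothed = gaussian_smooth(fine_flux, fine_pixel, fwhm_kms=170.0)
coarse = resample(smoothed, fine_pixel, coarse_pixel)
noisy = add_noise(coarse, [0.25] * len(coarse), random.Random(7))
```

Because the kernel is normalised, the smoothing conserves the mean flux: the narrow feature is spread over many pixels but its equivalent width is unchanged.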

In the high resolution spectra, the transmitted flux is convolved with a Gaussian of
full width at half maximum (FWHM) 6.6 km s$^{-1}$, and re-sampled on pixels of
wavelength size $\Delta\lambda = 0.04$ Å. In units of km s$^{-1}$, the pixel size
varies from 2 km s$^{-1}$ at z = 3.9 to 2.9 km s$^{-1}$ at z = 2.4. Further, Gaussian
noise is added to each pixel with an amplitude given by a flux-dependent rms noise per
pixel, $n(F)$, as measured in McDonald et al. (2000, hereafter M00; see their Table 3).

In the low resolution spectra, we adopt a FWHM of 170 km s$^{-1}$ (i.e. a dispersion
of ~ 70 km s$^{-1}$), and re-sample the flux on a uniform grid with
$\Delta\log_{10}\lambda = 10^{-4}$. The low signal-to-noise of the SDSS spectra in the
Lyα region requires an accurate description of the noise. The true error on each pixel
in each quasar spectrum is essentially made up of the Poisson noise from photon
counts, the CCD read-noise and systematic errors from sky subtraction. McDonald et al.
(2006) have outlined a procedure which closely follows the spectroscopic data
reduction pipeline. The drawback of this method is that it requires a sky-flux
estimate. Here, we simply assume that the noise distribution is Gaussian, with an rms
variance given by the SDSS data reduction pipeline. Although the latter computes a
variance for each flux pixel, there is some evidence that these error estimates do not
perfectly reflect the true errors in the data (Bolton et al. 2004; McDonald et al.
2006; Burgess 2004). McDonald et al. (2006) and Burgess (2004) have recalibrated the
noise of their spectra by differencing multiple exposures, and found that the rms
noise variance given by the SDSS pipeline is underestimated by 8 and 5 per cent on
average, respectively. We will hereafter refer to $\sigma_p$ as the fiducial
(pipeline) noise estimate. The sensitivity of the mock spectra to the noise level will
be discussed in detail in §5.2. Given the relatively broad distribution of pixel noise
variances at fixed flux $F$, we use the full set of rms noise variances from the data
points with < F < 1. We then map with repetition the individual noise estimates onto
the pixels < F < 1 in the idealized mock spectra (e.g. Burgess 2004). Our mapping
accounts for the fact that the mean pixel noise increases with redshift (see Figure 3).

Figure 1. The effective optical depth $\tau_{\rm eff} = -\ln\langle F\rangle$ as
measured from high-resolution spectra. The empty triangles, squares and circles show
the measurements of Rauch et al. (1997), McDonald et al. (2000) and Kim et al. (2004).
The filled circle is the measurement of Tytler et al. (2004). The dotted and dashed
curves show the evolution reported by Kim et al. (2002) and Schaye et al. (2003). The
vertical extent of the shaded regions indicates the values considered in our analysis.



2.3.2 Strong absorption systems

High column density systems associated with collapsed objects such as disk galaxies
are not reproduced in the lognormal model (Bi & Davidsen 1997). Large column densities
($N_{\rm HI} \gtrsim 10^{17}\,{\rm cm}^{-2}$) are needed to produce the strong damping
wings of the observed absorption line profile. In the high-resolution data, damped Lyα
systems (DLAs) are removed by eliminating wavelength intervals containing these
absorption lines from the spectra. These strong absorption lines are, however, present
in our sample of SDSS spectra. We therefore include a set of DLAs and Lyman limit
systems (LLS) in the low resolution mock spectra. We follow the procedure outlined in
McDonald et al. (2005b), which is based on the self-shielding model of Zheng &
Miralda-Escudé (2002). The column density distribution of these strong absorbers is
then normalised to reproduce the observations of Péroux et al. (2003) and Prochaska,
Herbert-Fort & Wolfe (2005). As the Doppler parameter distribution of the strong
absorption systems is largely unknown, we use a single value of 30 km s$^{-1}$ at all
redshifts.



3 CONSTRAINING THE MODEL PARAMETERS FROM HIGH RESOLUTION DATA



In this Section, we constrain the model parameters from a comparison between the flux
power spectrum (PS) and probability distribution (PDF) of the transmitted flux of mock
and observed high resolution spectra. Desjacques & Nusser (2005) have pointed out that
the models that best match the PS alone do not necessarily yield a good fit to the
PDF. It is therefore important to combine the PS and PDF statistics to ensure that
both are correctly reproduced in the lognormal spectra. We perform a $\chi^2$
statistical test for the observed flux power spectrum and PDF to determine
quantitatively the values of the parameters required to fit high resolution
measurements of the Lyα forest.



3.1 The high resolution data 

We use the measurements of M00 (McDonald et al. 2000), which were obtained from a
sample of eight high resolution QSO spectra. Results are provided for three redshift
bins centered at z = 2.41, 3.00 and 3.89. Regarding the flux power spectrum, we
consider the data points in the range $0.005 < k < 0.05\,{\rm s\,km^{-1}}$. The lower
limit $k = 0.005\,{\rm s\,km^{-1}}$ is chosen so as to avoid continuum fitting errors
(Hui et al. 2001), and the upper limit $k = 0.05\,{\rm s\,km^{-1}}$ is chosen to avoid
metal contamination on smaller scales (Kim et al. 2004). The observed flux PDF is very
sensitive to continuum fitting, especially in the high transmissivity tail (e.g.
Meiksin, Bryan & Machacek 2001). The modelling of these errors is complicated by the
fact that the scales of interest are of the order of the box size $L$ of the
simulations. However, M00 demonstrate that, if the inclusion of continuum fitting
errors can account for most of the discrepancy between the simulated and observed PDF
in the range $F \gtrsim 0.8$, it should not greatly affect the PDF for
$F \lesssim 0.8$. We therefore exclude the data points with F > 0.8 from the analysis
to avoid dealing with those errors. The M00 measurements are shown in Figure 2 as the
filled symbols. The shaded areas indicate the data points used in this analysis
(10 + 15 = 25 measurements from the PS and PDF, respectively).



3.2 The parameter grid 

For each value of the parameter vector ${\bf p} = (k_F, \langle F\rangle, \gamma, T_4)$,
we generate mock catalogues of 1000 lines of sight. We let the filtering wavenumber,
the IGM adiabatic index and temperature assume the following values,

$$ k_F = 5.55,\ 6.25,\ 7.14,\ 8.33,\ 10,\ 12.5,\ 16.67,\ 25,\ 50 $$
$$ \gamma = 1,\ 1.2,\ 1.4,\ 1.6 $$
$$ T_4 = 1,\ 1.5,\ 2,\ 2.5 , $$

irrespective of the redshift. The values of $k_F$ (in units of $h\,{\rm Mpc}^{-1}$) are
chosen such that $1/k_F$ uniformly spans the range 0.02-0.18 $h^{-1}{\rm Mpc}$.
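As a quick check of the grid quoted above, the short sketch below rebuilds the filtering wavenumbers from a uniform spacing in 1/k_F and counts the grid points per value of the mean flux. The variable names are illustrative only.

```python
import itertools

# 1/k_F runs from 0.18 down to 0.02 h^-1 Mpc in steps of 0.02.
k_f_values = [100.0 / (18 - 2 * i) for i in range(9)]
gamma_values = [1.0, 1.2, 1.4, 1.6]
t4_values = [1.0, 1.5, 2.0, 2.5]

# Every (k_F, gamma, T_4) combination evaluated for a given mean flux.
grid = list(itertools.product(k_f_values, gamma_values, t4_values))

quoted = [5.55, 6.25, 7.14, 8.33, 10.0, 12.5, 16.67, 25.0, 50.0]
largest_difference = max(abs(a - b) for a, b in zip(k_f_values, quoted))
```

The reconstructed values agree with the quoted ones to within rounding in the second decimal place, and the grid contains 9 x 4 x 4 = 144 models for each adopted mean flux.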

The assumed effective optical depth $\tau_{\rm eff}$ or, equivalently, mean flux
$\langle F\rangle = \exp(-\tau_{\rm eff})$ has a large impact on the simulated one-
and two-point statistics of the forest. Observations indicate that $\tau_{\rm eff}$
evolves strongly in the redshift range $2 \lesssim z \lesssim 4$. In Fig. 1, we show
several measurements of $\tau_{\rm eff}$ obtained from high resolution observations.
Empty triangles and squares show the results of Rauch et al. (1997) and McDonald et
al. (2000) for a comparable sample of HIRES spectra, whereas the empty circles
indicate the estimates from the LUQAS sample of Kim et al. (2004). The filled circle
shows the measurement of Tytler et al. (2004). The dotted and dashed curves indicate
the evolution reported by Kim et al. (2002) and Schaye et al. (2003). Note that there
is significant overlap among the quasar samples of Schaye et al. (2003) and Kim et al.
(2004). All these estimates have been obtained after removing damped/sub-damped Lyα
systems and pixels contaminated by associated metal absorption. The measurements of
$\tau_{\rm eff}$ are mostly affected by cosmic variance due to large variations
between lines of sight, uncertainties in the continuum fitting procedure and the
somewhat uncertain contribution from metal lines (P93; Zuo & Bond 1994; Rauch et al.
1997; Tytler et al. 2004; Viel et al. 2004a). In particular, the continuum fitting
generally adopted for high resolution data may result in an underestimation of the
continuum and of $\tau_{\rm eff}$ (Kim et al. 2001). Based on these measurements, we
adopt the following values for the mean flux $\langle F\rangle$:
$0.45 < \langle F\rangle < 0.55$ (z = 3.9), $0.65 < \langle F\rangle < 0.75$ (z = 3)
and $0.75 < \langle F\rangle < 0.85$ (z = 2.4), in line with the values listed in
Table 1. These intervals are shown as shaded regions in Fig. 1. Although the effective
optical depth appears to evolve smoothly with redshift, the behaviour of
$\tau_{\rm eff}(z)$ inferred from a large sample of SDSS quasars is found to deviate
from a power law around z = 3.2 (B03), suggesting that He II reionizes in that
redshift range (Theuns et al. 2002b; Schaye et al. 2000).

We use a spectral grid of $N = 2^{19}$ pixels which are evenly spaced in wavelength
($\Delta\lambda = 0.005$ Å). $N$ and $\Delta\lambda$ are chosen so that the number of
pixels in the "degraded" mock spectra is a constant power of 2 to facilitate the
computation of Fourier transforms.

Figure 2. The flux power spectrum and PDF of the best-fitting models
($\Delta\chi^2 = 0$) at redshift z = 3.9 (top), 3.0 (middle) and 2.4 (bottom). The mean
flux is, respectively, $\langle F\rangle$ = 0.45, 0.5, 0.55, $\langle F\rangle$ = 0.65,
0.7, 0.75, and $\langle F\rangle$ = 0.75, 0.8, 0.85. In each panel, the solid and
short-dashed curves stand for the lowest and largest values of $\langle F\rangle$,
respectively. The IGM temperature has a fixed value, $T_4 = 1.5$. Only $k_F$ and
$\gamma$ are varied to obtain the best-fitting parameters. The shaded regions indicate
the data points we use to compute the value of $\chi^2$. The best-fitting values of
$k_F$ and $\gamma$, together with the chi-squared value, are listed in Table 1.

For each mock catalogue, we calculate the flux power spectrum and the flux PDF. We
determine the goodness of fit of any model in the grid by computing a $\chi^2$
statistic from the difference between the simulated PS and PDF and the observational
data. In the calculation of the $\chi^2$, we neglect the correlations between
measurements of the flux PS. However, in the case of the flux PDF, we include the full
covariance matrix since these measurements are highly correlated (M00, i.e. McDonald
et al. 2000). We take advantage of the smooth dependence of the flux PS and PDF on the
parameter vector, and use cubic spline interpolation to find the best-fitting models.
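The two contributions to the goodness of fit described above, an uncorrelated sum for the power spectrum and a full-covariance quadratic form for the PDF, can be sketched as follows. This is a minimal pure-Python illustration with made-up numbers: the function names are hypothetical and the small linear solver stands in for whatever linear algebra routine one would normally use.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def chi2_uncorrelated(data, model, sigma):
    """Sum of squared, error-normalised residuals (used here for the PS)."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))


def chi2_correlated(data, model, cov):
    """r^T C^{-1} r with the full covariance matrix (used here for the PDF)."""
    resid = [d - m for d, m in zip(data, model)]
    weighted = solve(cov, resid)
    return sum(r * w for r, w in zip(resid, weighted))


# Made-up numbers, purely to exercise the two functions.
ps_term = chi2_uncorrelated([1.0, 2.0], [0.0, 0.0], [1.0, 2.0])
pdf_term = chi2_correlated([1.0, 1.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
total = ps_term + pdf_term
```

In the second example the positive correlation between the two bins lowers the chi-squared from 2 (what a diagonal treatment would give) to 4/3, which illustrates why neglecting the covariance of strongly correlated PDF bins would misstate the goodness of fit.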



3.3 The best-fitting models 

Fig. 2 compares the best-fitting models to the M00 data at redshift z = 3.9 (top
panels), 3.0 (middle panels) and 2.4 (bottom panels). The parameter values of the
models are listed in Table 1. The solid, long dashed and short dashed curves show,
respectively, the models with lowest, intermediate and largest value of
$\langle F\rangle$ at a given redshift. The IGM temperature has a fixed value,
$T_4 = 1.5$. Only $k_F$ and $\gamma$ are varied to obtain the best-fitting parameters.
The shaded regions indicate the data points we use to compute the value of $\chi^2$.
Note that the 1D grid resolves the best-fitting values of the filtering length with at
least 10 cells.

Table 1. Parameter values of the models which best fit the PS and PDF inferred from
high-resolution measurements of the Lyα forest. The mean IGM temperature has a fixed
value $T_4 = 1.5$. Only $k_F$ and $\gamma$ are varied to obtain the best-fitting
models. The filtering wavenumber $k_F$ is in units of $h\,{\rm Mpc}^{-1}$. The last
column gives the chi-squared for 23 degrees of freedom. Note that, since we spline
interpolate over the parameters, the best fit values do not necessarily lie at a grid
point.

  redshift   <F>     k_F     gamma   chi^2
  3.9        0.45    37.1    1.00     24.6
             0.50    32.7    1.00     26.7
             0.55    36.3    1.12     45.7
  3.0        0.65    18.3    1.00     39.5
             0.70    20.5    1.00     22.6
             0.75    32.8    1.16     40.7
  2.4        0.75    12.5    1.00    108.8
             0.80    15.5    1.00     41.8
             0.85    25.0    1.00     46.3

The best-fitting models provide an acceptable fit to the data down to redshift z = 3.
At z = 3.9 and 3, the best chi-squared has an acceptable value of
$\chi^2 \lesssim 25$ for 23 degrees of freedom (25 data points minus the two fitted
parameters, the filtering length and the adiabatic index). At z = 2.4 however,
$\chi^2 \gtrsim 40$, which should be exceeded randomly only ~1.5 per cent of the time.
As expected, the lognormal approximation no longer provides a good fit (in a
chi-squared sense at least) to the data for z < 3 (e.g. Nusser & Haehnelt 2000). For
most of the models, the best-fitting value of the adiabatic index is $\gamma = 1$,
whereas observations indicate that $\gamma \sim 1.3$-$1.5$ in the redshift range
considered here (Schaye et al. 2000b; McDonald & Miralda-Escudé 2001). Note, however,
that there is a degeneracy between the filtering wavenumber $k_F$ and the adiabatic
index $\gamma$ which allows one to match the data with larger values of $\gamma$ and
$k_F$ (Desjacques & Nusser 2005; see also Meiksin & White 2001).
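The quoted probability of exceeding a given chi-squared by chance can be checked with a few lines of code. The sketch below integrates the chi-squared density numerically with Simpson's rule, which is an illustrative shortcut rather than a library routine; the function names and the truncation of the integral at a large upper limit are assumptions of this example.

```python
import math


def chi2_pdf(x, dof):
    """Chi-squared probability density, evaluated in log space for stability."""
    half = 0.5 * dof
    log_pdf = ((half - 1.0) * math.log(x) - 0.5 * x
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)


def chi2_survival(x, dof, upper=400.0, n=20000):
    """P(chi^2 > x) from Simpson's rule on [x, upper]; n must be even."""
    step = (upper - x) / n
    total = chi2_pdf(x, dof) + chi2_pdf(upper, dof)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * chi2_pdf(x + i * step, dof)
    return total * step / 3.0


p_z24 = chi2_survival(40.0, 23)   # threshold quoted for z = 2.4
p_z39 = chi2_survival(24.6, 23)   # best chi-squared at z = 3.9 in Table 1
```

For 23 degrees of freedom, a chi-squared of 40 is exceeded with a probability of roughly 1.5 per cent, whereas a value near 25 is unremarkable, in line with the statements above.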

The M00 (McDonald et al. 2000) results are averaged in relatively large redshift bins,
2.09 < z < 2.67, 2.67 < z < 3.39 and 3.39 < z < 4.43. The evolution of the mean flux
$\langle F\rangle$, for example, is significant over those redshift intervals. We
could account for the redshift evolution by averaging mock catalogues computed at
different $z$ in the same redshift bin before comparing with the observations.
However, this correction is difficult to apply as the exact dependence of $k_F$,
$\gamma$ and $T_4$ on the redshift is unknown. Additional assumptions on the
reionization history of the Universe could reduce the freedom in the parameter space.
In this respect, the observed line-width distribution suggests that, around z = 3,
there is a sharp increase in $T_4$ together with a decrease in $\gamma$ (Schaye et al.
2000b; Ricotti, Gnedin & Shull 2000; McDonald & Miralda-Escudé 2001). However, the
data are too noisy to provide robust constraints on $T_4$ and $\gamma$.



4 THE DATA 

4.1 The SDSS DR3 sample 

We use 3492 quasar spectra included in the Sloan Digital Sky Survey DR3 data release
(Abazajian et al. 2005). York et al. (2000) provide a technical summary of the survey.
The SDSS camera and the filter response curves are described in Gunn et al. (1998) and
Fukugita et al. (1996), respectively. Lupton et al. (2001) and Hogg et al. (2001)
discuss the SDSS photometric data and monitoring system. Richards et al. (2002)
describe the algorithm for targeting quasar candidates from the multi-color imaging
SDSS data. To avoid contamination from Lyβ absorption and the proximity effect on the
blue and red sides of the Lyα forest, we define the Lyα forest as the rest-frame
interval 1080-1160 Å (B03). The top panel of Fig. 3 shows the number of pixels in the
DR3 sample which belong to the Lyα forest as defined above. The gaps at z ~ 3.59 and
z ~ 3.84 correspond to the O I (5577 Å) sky line and the interstellar Na I (5894.6 Å)
line, respectively. Pixels in the wavelength ranges 5570 < λ < 5590 Å and
5885 < λ < 5905 Å were removed from the analysis. The fainter quasars, mostly those at
redshift z > 4, suffer from significant contamination from OH emission features.
Although these OH sky-subtraction residuals could in principle be removed (e.g. Wild &
Hewett 2005), we have not done so because they mostly affect pixels longward of
6700 Å.
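The pixel selection described above is simple enough to sketch directly. The rest-frame forest interval and the two masked windows are taken from the text; the Lyα rest wavelength of 1215.67 Å, the function names and the sample wavelengths are additions made for this illustration.

```python
LYA_REST = 1215.67                     # Lya rest wavelength in Angstrom
FOREST_REST = (1080.0, 1160.0)         # rest-frame forest interval (Angstrom)
SKY_WINDOWS = [(5570.0, 5590.0), (5885.0, 5905.0)]  # masked observed ranges


def absorption_redshift(obs_wavelength):
    """Redshift at which Lya absorption falls at the observed wavelength."""
    return obs_wavelength / LYA_REST - 1.0


def in_forest(obs_wavelength, z_qso):
    """True if the pixel lies in the forest interval and outside the masks."""
    rest = obs_wavelength / (1.0 + z_qso)
    if not FOREST_REST[0] <= rest <= FOREST_REST[1]:
        return False
    return not any(lo < obs_wavelength < hi for lo, hi in SKY_WINDOWS)


def forest_pixels(obs_wavelengths, z_qso):
    """Keep only the usable Lya forest pixels of one quasar spectrum."""
    return [w for w in obs_wavelengths if in_forest(w, z_qso)]


kept_z3 = forest_pixels([4300.0, 4400.0, 4700.0], z_qso=3.0)
kept_z4 = forest_pixels([5500.0, 5580.0, 5700.0], z_qso=4.0)
z_oi = absorption_redshift(5577.0)
z_nai = absorption_redshift(5894.6)
```

Converting the two masked lines to Lyα absorption redshifts gives values close to 3.59 and 3.85, which is where the gaps in the pixel distribution are expected to appear.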

The distribution of the signal-to-noise ratios in the Lyα forest as a function of the
median redshift $z_{\rm Ly\alpha}$ is plotted in the bottom panel of Fig. 3. The
transmitted flux in the forest is lower at high redshift, so higher redshift spectra
tend to have lower signal-to-noise ratios. The typical signal-to-noise ratio in the
Lyα forest is S/N ~ 4.6, 3.8 and 2.1 at redshift z = 2.4, 3.0 and 3.9, respectively.
As a result, most of the Lyα absorption lines are unresolved in the data.



4.2 Estimating the continuum 

In high resolution and high signal-to-noise spectra, the shape of the continuum is
determined separately for each QSO. A polynomial continuum is fitted to regions of the
Lyα forest which are free of absorption lines (as judged by eye). In low resolution
observations such as the SDSS DR3 sample, an object-by-object estimate of the
continuum is difficult. However, the large size of the sample is suitable for a
statistical approach. This allowed B03 to constrain simultaneously the mean quasar
continuum and mean transmitted flux in the Lyα forest. There is indeed a remarkable
similarity between the spectra of the most distant quasars at $z \gtrsim 6$ and their
low redshift counterparts (Fan et al. 2003). At fixed luminosity, the spectral
properties of quasars show little evolution with cosmic epoch (Vanden Berk et al.
2004). Hence, the mean QSO continuum can be thought of as being representative of the
quasar population as a whole.

Figure 3. Top: Distribution of the Lya forest pixels in the data
as a function of redshift. There are typically 350 pixels per spectrum
which lie in the Lya forest. The gaps at z = 3.59 and 3.84
correspond to the OI and NaI lines. Bottom: Average signal-to-noise
ratios in the Lya forest as a function of its median redshift
for the quasars in the DR3 sample.

The mean continuum is usually calibrated redward of
the Lya emission line and then extrapolated blueward assuming
a smooth power law shape (P93). Composite spectra
suggest that the shape of the quasar continuum is the superposition
of a single power law and emission lines. A principal
component analysis (PCA) demonstrates that this is
a reasonable assumption redward of the Lya emission line,
where QSOs differ significantly in the normalisation, but little
in the shape of the continuum (e.g. Yip et al. 2004). It is
unclear, however, whether this parametrisation can be extended
to wavelengths blueward of the Lya emission line
given the large impact of intervening absorption. At low
redshift, where absorption in the Lya range is much less
significant, composite spectra of IUE (International Ultraviolet
Explorer) and HST (Hubble Space Telescope) quasars
reveal that there is a significant steepening of the continuum
slope towards wavelengths shorter than ~1000 Å (Francis et
al. 1991; O'Brien, Gondhalekar & Wilson 1992; Zheng et al.
1997; Telfer et al. 2002). However, for a limited range in the
optical and UV, the continuum can be approximated by a power
law f_cont(ν) ∝ ν^α_ν. The distribution of indices may not be
Gaussian, and may also depend on redshift (e.g. Telfer et
al. 2002). However, measuring continuum indices without
a very large range of wavelength, or some estimate of the
strength of the contribution from blended emission lines,
proves difficult (e.g. Vanden Berk et al. 2001). Indeed, Natali
et al. (1998) have noted that the value of the continuum
index is sensitive to the precise rest wavelength regions used
for fitting. Therefore, the steep indices measured for high-redshift
quasars (e.g. Sargent, Steidel & Boksenberg 1989;
Schneider, Schmidt & Gunn 1991; Francis 1996; Fan et al.
2001) may be due to the restricted wavelength range used
in the fit, as suggested by Schneider et al. (2001), and not to
a change in the continuum index with redshift.

Figure 4. The distribution of continuum indices as a function of
redshift for quasars with z ≲ 3.6. The histogram indicates the
mean in bins of Δz = 0.2. The averaged continuum index is
-0.59 ± 0.35. The horizontal dashed line is α_ν = -0.44, the mean
slope reported by Vanden Berk et al. (2001) and B03.

Following P93 and B03, we assume that the shape of
the quasar continuum is the superposition of a single power
law and emission lines. Our approach, however, differs from
theirs in that we include variation in continuum indices. To
proceed, we select wavelength windows free of emission lines
and fit a power law continuum on a spectrum-by-spectrum
basis. Although there are essentially no emission-line free
regions (Vanden Berk et al. 2001), we use the rest wavelength
intervals 1450-1470 Å and 1975-2000 Å (e.g. Telfer et
al. 2002). A visual inspection of the fitted continuum has
convinced us that this prescription provides a reasonable
description of the individual continua. Note, however, that low
redshift SDSS composites indicate that the window 1975-2000 Å
may be contaminated by FeII emission (see Fig. 6 of
Vanden Berk et al. 2001). This would cause us to infer continua
softer than the actual ones (Telfer et al. 2002). The
distribution of continuum indices is plotted in Fig. 4 for
quasars with z ≲ 3.6. It is not possible to perform a similar
measurement of the continuum at higher redshift because of
the spectroscopic red limit of 9200 Å. The histogram indicates
the mean value of α_ν in bins of Δz = 0.2. The average
power law slope of the subsample is -0.59 (median -0.53)
with a 1σ dispersion of 0.36. This is in good agreement with
the mean slope reported by Vanden Berk et al. (2001) and
B03, α_ν = -0.44, and with values found in optically selected
samples (e.g. Francis et al. 1991; Natali et al. 1998).
At z ≳ 3.6, fitting a continuum proves difficult due to the
relatively low signal-to-noise, and the short continuum baseline
available redward of the Lya emission line. The rest
wavelength range redward of the CIV emission line, ~1600-1700 Å,
then plays a significant role in determining the slope
(Schneider et al. 2001). We have tried to fit the regions near
1260 Å and 1650 Å as done in Schneider et al. (2001). As a
consistency check, we have remeasured the continuum indices
of quasars with z ≲ 3.6 using that rest frame region.
We have found that this method gives steeper indices than
those inferred from the rest wavelength region 1460-2000 Å,
in agreement with Vanden Berk et al. (2001). We have tried
several other alternatives but none of them gave satisfactory
results. We have therefore opted for a constant slope
α_ν = -0.44 when the quasar redshift is z ≳ 3.6, implicitly
assuming that the mean continuum index does not change
much with redshift. Note that this will affect solely our
measurement of the Lya flux PDF at z = 3.9.

Figure 5. Two quasar spectra of the DR3 sample plotted as a
function of wavelength in the rest frame. The dashed line shows
the power law fit. The continuum index is α_λ = 0.10 and -1.56
for the low and high redshift spectrum, respectively. The shaded
area indicates the wavelength range 1450-1470 Å used to normalise
the continuum. The solid curve shows the continuum in the Lya
forest, including the emission lines. We analyse the Lya forest in
the wavelength region 1080-1160 Å.

We follow P93, Zheng et al. (1997) and normalise the
QSO continuum in the rest-wavelength range 1450-1470 Å
to have the same flux as the observed spectra. To account
for the emission lines blueward of 1216 Å, we adopt the
parametrisation of B03,

$$
C(\lambda_{\rm rest}) = \left(\frac{\lambda_{\rm rest}}{\text{Å}}\right)^{\alpha_\lambda}
 + c_2 \exp\left[-\frac{(\lambda_{\rm rest}-c_3)^2}{2c_4^2}\right]
 + c_5 \exp\left[-\frac{(\lambda_{\rm rest}-c_6)^2}{2c_7^2}\right]
 + c_8 \exp\left[-\frac{(\lambda_{\rm rest}-c_9)^2}{2c_{10}^2}\right]
\tag{8}
$$

where C is the continuum as a function of the rest wavelength
λ_rest. The continuum slope α_λ = -(2 + α_ν) is estimated
spectrum-by-spectrum as explained above. The position
of the peak of the Lya emission line (c_9 = 1215.67 Å), and
the other two emission lines seen in the composite spectrum
(c_3 = 1073 Å and c_6 = 1123 Å) are fixed to reduce the number
of free parameters. The remaining six parameters can be
obtained, e.g., from a χ² minimisation of the difference between
the simulated and observed composite spectra (B03).
Here, we have simply adopted the best fit values of B03 that
were obtained for a constant power law slope α_λ = -1.56.
Two examples of spectra and continuum are shown in Fig. 5.
The solid curve indicates our fit (8) to the continuum in the
Lya forest region. Note that the continuum index of the low
redshift spectrum, α_λ = 0.10, differs noticeably from the
average slope of -1.56.

Figure 6. The probability distribution of the transmitted flux
F is computed from the DR3 sample and plotted in differential
form. Results are presented for three different redshift intervals
of length Δz = 0.2 centered at z = 3.9, 3.0 and 2.4. Vertical bars
indicate the mean transmitted flux F̄ in each redshift interval.
The error bars are computed from a jackknife estimate.
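
The continuum recipe of this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the power law is fitted, so an unweighted least-squares fit in log-log space is assumed; the line amplitudes and widths passed to `continuum_shape` are placeholders rather than the B03 best-fit values; and `lam_ref` is a hypothetical reference wavelength whose value only changes the relative weight of the power law and the lines before normalisation.

```python
import math

# Line centres fixed in the text (rest-frame wavelengths in Angstrom).
LINE_CENTRES = (1073.0, 1123.0, 1215.67)


def fit_alpha_lambda(lam_rest, flux, windows=((1450.0, 1470.0), (1975.0, 2000.0))):
    """Least-squares slope of ln(flux) against ln(lambda) inside the windows."""
    xs, ys = [], []
    for lam, f in zip(lam_rest, flux):
        in_window = any(lo <= lam <= hi for lo, hi in windows)
        if in_window and f > 0.0:  # the logarithm needs a positive flux
            xs.append(math.log(lam))
            ys.append(math.log(f))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def alpha_nu(alpha_lambda):
    """Invert the relation alpha_lambda = -(2 + alpha_nu)."""
    return -(2.0 + alpha_lambda)


def continuum_shape(lam, alpha_lambda, amps, widths, lam_ref=1460.0):
    """Power law plus three Gaussian lines; amps and widths are placeholders."""
    value = (lam / lam_ref) ** alpha_lambda
    for amp, centre, width in zip(amps, LINE_CENTRES, widths):
        value += amp * math.exp(-((lam - centre) ** 2) / (2.0 * width ** 2))
    return value
```

The overall amplitude is not set here; in the text it is fixed afterwards by matching the observed flux in the 1450-1470 Å window.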



4.3 The probability distribution of the 
transmitted flux 

Once we have taken out the contribution of the continua
from our quasar spectra (on a spectrum-by-spectrum basis),
we compute P(F), the probability distribution of the transmitted
flux F. Since a substantial fraction of the pixels has
a transmitted flux which lies outside the interval 0 < F < 1,
we use 60 bins of width ΔF = 0.05 with the first centered on
F = -1 and the last on F = 2. The first and last bins include
the few additional points with F < -1 and F > 2, respectively.
Results are shown in the top panel of Fig. 6 for three
different redshift intervals as defined in Table 2. These redshift
intervals are centered on z = 3.9, 3.0 and 2.4, allowing
a direct comparison with the high resolution measurements
of McDonald et al. (2000). It is important to note that each
redshift interval covers a narrow redshift range of Δz = 0.2.
This is in order to reduce the impact of evolution with redshift
in each interval. The vertical bars mark the mean transmitted
flux F̄ in each redshift interval, obtained by averaging
the individual flux pixels (without weighting them according
to their noise). We have F̄ ~ 0.39, 0.64 and 0.73 respectively.
These values are significantly lower than those inferred from
the high resolution sample, F̄ ~ 0.45, 0.69 and 0.82 (M00).
The high noise level severely smooths the PDF relative to
that of high-resolution observations (compare with the right
panels of Fig. 2), and is responsible for the existence of pixels
with F > 1 and F < 0. The effect is strongest at z = 3.9,
where the average signal-to-noise in the Lya forest is lowest
(see Figure 3).
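
As an illustration of the binning described above, the following Python sketch builds a normalised histogram with bin centres running from F = -1 to F = 2 in steps of 0.05 and folds outliers into the end bins. Taking both end centres literally gives 61 bins, whereas the text quotes 60, so the exact edge convention used here is an assumption. The function names are hypothetical.

```python
import math


def flux_pdf(fluxes, f_min=-1.0, f_max=2.0, width=0.05):
    """Normalised histogram with bins centred on f_min, f_min + width, ..., f_max."""
    n_bins = int(round((f_max - f_min) / width)) + 1
    counts = [0] * n_bins
    for f in fluxes:
        index = int(math.floor((f - f_min) / width + 0.5))  # nearest bin centre
        index = min(max(index, 0), n_bins - 1)              # outliers go to the end bins
        counts[index] += 1
    centres = [f_min + i * width for i in range(n_bins)]
    pdf = [c / (len(fluxes) * width) for c in counts]
    return centres, pdf


def mean_flux(fluxes):
    """Unweighted mean over pixels (no noise weighting, as in the text)."""
    return sum(fluxes) / len(fluxes)
```

Dividing the counts by the number of pixels times the bin width makes the histogram integrate to one, which corresponds to the differential form plotted in Fig. 6.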

Table 2. The redshift intervals considered in this paper. The last
two columns show the number of spectra and pixels included in
each interval.

⟨z⟩    z_min    z_max    N of spectra    N of pixels
2.4    2.3      2.5      942             127212
3.0    2.9      3.1      1082            133371
3.9    3.8      4.0      281             29003

The error bars attached to the measured PDF shown
in Fig. 6 are obtained from a jackknife estimate of the covariance
matrix C_ij, which includes Lya forest fluctuations
and measurement noise. Its diagonal elements are plotted
as dashed curves in Fig. 7. Note that the curves at redshift
z = 2.4 and 3 have been shifted vertically by 0.04 and 0.02
respectively for clarity. As it is difficult to estimate cleanly
off-diagonal terms or diagonal elements lying in the tails of
the PDF with such an estimator, we have also computed
the errors from the dispersion across many realisations of
mock spectra with properties similar to those of the observed
sample (cf. Section 2). In particular, the mock samples
have exactly the same total wavelength coverage (number
of pixels) as the actual sample. The parameters assume the
best fit values inferred in §3, with a mean flux ⟨F⟩ = 0.5,
0.7 and 0.8. The jackknife and mock estimates of √C_ii are
shown in Fig. 7 as dashed and solid curves, respectively. They
are consistent with each other at redshift z ≳ 3. However, at
lower redshift, the errors inferred from the observed sample
are significantly larger than those obtained from the mocks.
We have found that the jackknife estimator, when applied to
a single mock SDSS sample, predicts errors similar to those
inferred from a large number of mock realisations. Therefore,
this discrepancy probably reflects errors in the measurement
of the flux PDF, errors in the modelling of low resolution SDSS
spectra, and/or the failure of the lognormal model to adequately
describe the low redshift Lya forest (e.g. Nusser & Haehnelt
1999). Note that the off-diagonal terms of the covariance matrix
are negligible, presumably because the large noise washes
out correlations among the data points. The correlation
coefficient r_ij = C_ij/√(C_ii C_jj) satisfies |r_ij| ≲ 0.01 for
i ≠ j. This is in contrast with high signal-to-noise measurements
of the PDF (see e.g. M00; Lidz et al. 2005) where
off-diagonal terms in the covariance matrix are significant
due to the strong correlation among data points.
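
A jackknife covariance and the associated correlation coefficients can be sketched as follows. The text does not specify which unit is deleted in each jackknife resample; this example assumes a delete-one-spectrum scheme applied to per-spectrum bin counts, and the function names are hypothetical.

```python
import math


def jackknife_covariance(per_spectrum_counts):
    """Delete-one jackknife covariance of the fraction of pixels in each flux bin.

    per_spectrum_counts[s][i] is the number of pixels of spectrum s in bin i.
    """
    n_spec = len(per_spectrum_counts)
    n_bins = len(per_spectrum_counts[0])
    totals = [sum(row[i] for row in per_spectrum_counts) for i in range(n_bins)]

    # Estimate of the bin fractions with each spectrum left out in turn.
    leave_one_out = []
    for row in per_spectrum_counts:
        kept = [t - r for t, r in zip(totals, row)]
        norm = float(sum(kept))
        leave_one_out.append([k / norm for k in kept])

    mean = [sum(est[i] for est in leave_one_out) / n_spec for i in range(n_bins)]
    factor = (n_spec - 1) / n_spec
    return [[factor * sum((est[i] - mean[i]) * (est[j] - mean[j])
                          for est in leave_one_out)
             for j in range(n_bins)] for i in range(n_bins)]


def correlation(cov):
    """r_ij = C_ij / sqrt(C_ii C_jj); entries with zero variance are set to 0."""
    n = len(cov)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(cov[i][i] * cov[j][j])
            r[i][j] = cov[i][j] / denom if denom > 0.0 else 0.0
    return r
```

Because the bin fractions sum to one in every resample, each row of the resulting covariance sums to zero, which is a convenient sanity check on an implementation of this kind.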

In Section 3, we have constrained the model parameters
from measurements of the flux PS and PDF which are
fairly representative of the true statistics at redshift z = 2.4,
3 and 3.9. However, these measurements are averaged in relatively
large redshift bins, over which the evolution of the
Lya forest is significant. It is, therefore, prudent to examine
the extent to which our measurements of the PDF at
redshift z = 2.4, 3 and 3.9 are sensitive to the adopted
redshift intervals. We have thus computed the PDF of the
transmitted flux for the redshift intervals adopted in M00
(2.09 < z < 2.67, 2.67 < z < 3.39 and 3.39 < z < 4.43).
The results are shown as dashed curves in Fig. 8. They are
compared to our fiducial measurement of the probability distribution
function obtained with a redshift interval Δz = 0.2
(solid curves). The difference is largest at z = 3.9, where the
sharp decrease in the number of pixels at z ≳ 4 (Fig. 3) and
the strong increase in the flux over the range 3.5 < z < 4.5
conspire to raise the average mean flux by 10 per cent. At
z = 2.4, the height of the peak is decreased by about 10
per cent. Obviously, the strength of the effect depends on
the exact shape of the selection function (see Fig. 3). This
result suggests that the PDF measured by M00 from a sample
of Keck quasars is also likely to be biased with respect
to that measured in intervals of size Δz = 0.2. We will discuss
this point later in Section 6. In Fig. 8, the dotted
curves show the PDF at z = 3 and 2.4 for a fixed value of
the continuum index, α_λ = -1.56. Interestingly, accounting
for variation in the continuum slope has a noticeable impact
on the PDF, especially on the average mean flux. For
a fixed index α_λ = -1.56, the mean flux at z = 3 and 2.4
is F̄ = 0.626 and 0.675, respectively, i.e. ~3 and ~7 per cent
lower than the values of 0.643 and 0.728 obtained with a
spectrum-by-spectrum fitting. We have also changed the interval
defining the Lya forest, and found that the measured
PDF is robust to the wavelength range as long as features
intrinsic to the quasar are excluded.

Figure 7. A comparison between the diagonal terms √C_ii of the
covariance matrix computed from the data with a jackknife estimate
(dashed curves), and the √C_ii obtained from a large number
of mock catalogues (solid curves). Results are shown at z = 2.4,
3 and 3.9 (from bottom to top). The curves at z = 2.4 and 3 have
been shifted vertically for clarity.

Figure 8. The probability distribution of the transmitted flux
for the redshift intervals adopted in M00 (dashed curves), for the
redshift intervals considered in this work (solid curves), and for a
fixed continuum index α_λ = -1.56 (dotted curves). The redshift
intervals are centered at z = 2.4, 3 and 3.9.

Figure 9. A comparison between the observed and mock Lya flux PDFs of low-resolution QSOs at redshift z = 3.9, 3 and 2.4. The
error bars are attached to the observed PDF only. The solid and dotted vertical bars indicate the observed and predicted mean flux F̄.
In the left panel, the models have ⟨F⟩ = 0.50, 0.70 and 0.80. In the right panel, the mean flux parameter is ⟨F⟩ = 0.45, 0.65 and 0.75,
respectively.
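
The quoted percentage differences between the fixed-index and spectrum-by-spectrum mean fluxes can be checked directly from the four numbers given in the text. The snippet below is a plain arithmetic check; the helper name is hypothetical.

```python
def relative_deficit(fitted, fixed):
    """Percentage by which the fixed-index mean flux falls below the fitted one."""
    return 100.0 * (fitted - fixed) / fitted


# Mean fluxes quoted in the text: (spectrum-by-spectrum, fixed index).
values = {3.0: (0.643, 0.626), 2.4: (0.728, 0.675)}

for z, (fitted, fixed) in values.items():
    print(f"z = {z}: {relative_deficit(fitted, fixed):.1f} per cent lower")
```

The two deficits come out at roughly 2.6 and 7.3 per cent, consistent with the rounded values of about 3 and 7 per cent.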



5 COMPARISON BETWEEN OBSERVED AND 
MOCK SPECTRA 

In this Section, we compare the flux probability distribution 
of low resolution mock spectra with that inferred from the 
DR3 sample. 



5.1 The PDF of the transmitted flux 

We generate mock catalogues of low resolution spectra for
the best-fitting values of the parameters obtained in Section 3.
We adopt a grid similar to that used in §3. The
comoving length of a single mock spectrum is typically
≳ 1000 h^-1 Mpc. We account for instrumental resolution,
noise and the presence of strong absorption systems according
to the procedure outlined in §2.3.

The observed and mock PDFs are compared in the left
panel of Fig. 9. Error bars are attached to the observed
PDF only. The mock spectra have a mean flux parameter
⟨F⟩ = 0.50, 0.70 and 0.80 at z = 3.9, 3 and 2.4. This corresponds
to an 'effective' (i.e. including strong absorption
systems) mean flux F̄ ≈ 0.47, 0.67 and 0.78 respectively.
The solid and dotted vertical bars indicate F̄ for the observed
and simulated samples, respectively. F̄ is on average
larger by ≳10 per cent in the mock samples. The mock PDF
reproduces the overall shape and redshift evolution measured
in the data. However, the agreement is poor given the
small error bars. At z ≲ 3 in particular, the peak in the flux
probability distribution is significantly more pronounced in
the mock PDF than in the observations.



5.2 Sensitivity to the mean flux, noise level, and 
the presence of strong absorption systems 

Since the synthetic spectra have been constrained to reproduce
the observed PDF and PS measured in high resolution
data, the shortcomings of the lognormal model are unlikely
to be responsible for the difference between the observed
and mock PDFs. The disagreement must originate either
in the transformation of the idealised mocks into realistic-looking
SDSS spectra, or in the measurement of the Lya
flux probability distribution of the SDSS sample. We now
discuss a number of systematics that may cause the difference
between data and simulation.



5.2.1 Mean flux 

Fig. 9 examines the sensitivity of the SDSS PDF to the mean
flux level. The right panel shows the mock PDF of the best-fitting
models listed in Table 1, with ⟨F⟩ = 0.45, 0.65 and
0.75 at z = 3.9, 3 and 2.4, respectively (this corresponds
to F̄ = 0.42, 0.63 and 0.74). These values are marginally
consistent with those inferred from high resolution measurements
of the Lya forest. Note, however, that the z ≲ 3 models
poorly account for the observed Lya flux PDF and PS.
Changing ⟨F⟩ affects both the shape and the peak position
F_peak of the PDF. The right panel of Fig. 9 demonstrates
that a ~10 per cent decrease in ⟨F⟩ improves the agreement
with the observed PDF, especially with the observed mean
flux F̄. Notwithstanding this, at z ≲ 3, the mock PDF still
peaks at a higher value of the transmitted flux F than the
observed PDF. The effect is strongest at z ≲ 3. Consequently,
lowering the mean flux level ⟨F⟩ in the mock spectra can at
best partly alleviate the tension between the mock and observed
PDFs. It should also be noted that, in the mock PDF,
the mean flux value F̄ is significantly lower than F_peak at
z ≲ 3. This follows from the fact that the PDF is asymmetric
around F_peak, decreasing sharply for F ≳ F_peak.

Figure 10. Top panel: Impact of strong absorption systems on
the flux PDF. The solid and dashed curves show the PDF with
and without strong absorption systems. Bottom panel: Sensitivity
of the PDF to the noise level. The dashed curves show the PDF
when the noise is set to zero. The mock PDFs have ⟨F⟩ = 0.5, 0.7
and 0.8 for z = 3.9, 3 and 2.4, respectively.



5.2.2 Metal lines and strong absorption systems 

The probability distribution of the flux may be affected by
the presence of metal lines and strong absorption systems
(e.g. Schaye et al. 2003; Viel et al. 2004b). The top panel of
Fig. 10 investigates the sensitivity of the PDF to the presence
of strong absorption systems. The inclusion of strong
absorption systems (cf. Section 2.3) decreases the mean
transmitted flux F̄ of the low resolution mock spectra by
~3-6 per cent in the redshift range 2.5 ≲ z ≲ 4. We confirm
the results of McDonald et al. (2005b) who find that
the Lya forest is not very sensitive to the details of the
strong absorption lines, except when the damping wings become
important. Regarding the presence of metal lines, the
typical metallicity of the low-density IGM remains largely
unknown, although early statistical analyses based on pixel
optical depth methods seemed to indicate that there is CIV
and OVI associated with the low-column density Lya forest
(Cowie & Songaila 1998; Ellison et al. 2000; Schaye et al.
2000a). More recent studies appear to refute these claims,
and suggest that the volume filling fraction of metals is
small, ~5 per cent, in the redshift range 2 < z < 4 (Pieri &
Haehnelt 2003; Aracil et al. 2004). Clearly, at redshift z ≳ 3,
absorption in the Lya forest region is strongly dominated
by the Lya resonant transition, and the impact of metals
on the flux PDF is likely to be negligible. At lower redshift,
however, metals contribute more significantly to the absorption.
From a sample of quasars at mean redshift z = 1.9,
Tytler et al. (2004) have estimated that metal lines absorb
~2.5 per cent of the flux in the Lya forest. However, this
falls short of explaining the ≳10 per cent difference found
between the observed and simulated PDFs.



5.2.3 Noise level 

The bottom panel of Fig. 10 illustrates the sensitivity of the
flux probability distribution to the amount of noise. The
solid curve shows the mock PDF obtained with our default
noise level σ_p given by the SDSS reduction pipeline (cf. §2.3).
The dashed curve shows the PDF when the noise is set to
zero. The large pipeline noise smooths the idealized, noise-free
PDF significantly. A few per cent change in σ_p noticeably
affects the shape of the PDF. In this respect, at z ≲ 3,
increasing σ_p would lower the peak, and bring the PDF in
better agreement with the observations. Note, however, that
changing the noise level leaves the mean flux ⟨F⟩ unchanged.
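
The two statements above, that noise broadens the PDF but leaves the mean flux unchanged, can be illustrated with a toy experiment. The noise-free fluxes below are an arbitrary stand-in confined to 0 <= F <= 1, not the lognormal model of the paper, and the noise rms of 0.25 is an illustrative value rather than the SDSS pipeline estimate.

```python
import math
import random


def add_noise(fluxes, sigma, rng):
    """Add independent zero-mean Gaussian noise of rms sigma to each pixel."""
    return [f + rng.gauss(0.0, sigma) for f in fluxes]


def mean(values):
    return sum(values) / len(values)


def rms_width(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


rng = random.Random(1)
clean = [math.sqrt(rng.random()) for _ in range(20000)]  # toy noise-free fluxes
noisy = add_noise(clean, 0.25, rng)
outside = sum(1 for f in noisy if f < 0.0 or f > 1.0) / len(noisy)

print("mean flux (clean, noisy):", mean(clean), mean(noisy))
print("rms width (clean, noisy):", rms_width(clean), rms_width(noisy))
print("fraction of pixels outside [0, 1]:", outside)
```

The means agree to within the sampling error, while the noisy distribution is broader and acquires pixels with F < 0 and F > 1.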

Fig. 3 shows that the distribution of signal-to-noise in
our sample is broad. To assess the extent to which the shape
of the measured PDF is affected by the noise level, we split
the sample based on the mean signal-to-noise in the Lya
forest. The upper panels of Fig. 11 show the probability distribution
of the Lya flux for spectra with S/N > 4 (left panels)
and S/N < 4 (right panels) for our fiducial continuum
and noise level. Results are shown solely for z = 3 and 2.4,
as higher redshift spectra tend to have low S/N ratios (see
Figure 3). The observed PDF is plotted with errorbars. Cosmic
variance errors have been computed separately for both
subsamples, which include approximately the same number
of data points (~6 × 10^4 pixels). The upper right panels
show that the peak height of the observed flux probability
distribution of low S/N spectra is larger at z = 3 than at
z = 2.4, while high S/N spectra show the opposite trend.
This is presumably due to the large noise which dominates
the signal, and gives the PDF a nearly Gaussian shape.

Since the SDSS spectral resolution varies in the range
R ~ 1800-2100, we have also computed the PDF with a
lower instrumental resolution of 150 km s^-1 (FWHM). At
z ≲ 3, a decrease of ≲10 per cent in the instrumental resolution
has a relatively small impact on the flux probability
distribution.

Figure 11. A comparison between the measured (curves with errorbars) and mock probability distributions of the Lya flux at z = 3 and
2.4. The left and right panels are for spectra with S/N ratios greater than and less than 4, respectively. The upper panels show results
for our default noise and continuum. In the middle panels, the fiducial continuum has been decreased by 10 per cent. In the bottom
panels, both the continuum and the noise have been rescaled by a factor 0.9 and 1.2, respectively.



5.3 Changing the continuum and noise levels 

Decreasing the mean flux, accounting for metal lines or in- 
creasing the noise level in the mock spectra can only partly 
account for the difference between the observed and simu- 
lated PDF. 

In the data, the main sources of systematic errors are
inaccuracies in the continuum fitting of the spectra. Given
the large degeneracy between the amount of absorption and
the continuum level in the Lya region, it is unclear whether
the single power law approximation can be extended shortward
of the Lya emission line. Low redshift spectroscopic
measurements indeed show that there is a break around
1000-1300 Å in the slope of the mean quasar continuum.
They indicate that the continuum turns over in that rest
frame region, from α_ν > -1 longward of the break to
α_ν < -1 shortward (e.g. Zheng et al. 1997; Telfer et al.
2002). The exact location of the break is, however, difficult
to determine due to the presence of emission lines. This
turnover is accounted for neither in the continuum extrapolation
method of P93 and B03, nor in our procedure. As
noted by Kim et al. (2001) and Meiksin, Bryan &
Machacek (2001), this may lead to an underestimation of
⟨F⟩ of up to ~7 per cent (Seljak, McDonald
& Makarov 2003). Consequently, we will now relax
the assumption of a single power law. Since there are large
uncertainties in the behaviour of the continuum blueward
of 1216 Å, we have not looked for a parametric form of the
turnover. Instead, we have simply assumed that the true
continuum is a rescaled version of our fiducial continuum
C, C_true = β_c C, where β_c is a mean correction factor which
we will attempt to constrain. We ignore any possible dependence
on redshift and quasar luminosity.

The middle panels of Fig. 11 show the observed probability
distribution when the continuum in the Lya region
is rescaled to 90 per cent of its fiducial level (β_c = 0.9). It is
compared to the mock PDF obtained with the fiducial noise level.
Note that the errors in the observed transmitted flux depend on the
continuum level, as F = f_obs/f_cont. The 10 per cent decrease
in the continuum translates into a comparable increase in
the observed mean flux, which is now F̄ = 0.69 and 0.80 at
z = 3 and 2.4, respectively. Notwithstanding this, the peak
in the simulated PDF is still more pronounced than in the
observed PDF. A further decrease of the continuum does not
improve the agreement. In fact, unless the actual mean flux
is significantly lower than that inferred from high resolution
data, a substantial increase in the noise level is needed to
reproduce the smooth shape of the observed PDF.

The bottom panels of Fig. 11 demonstrate that the
agreement with the data is substantially improved if the
pipeline noise is increased by ~20 per cent. The agreement
is however better at z = 3 than at z = 2.4, where the mock
PDF still appears to be shifted to larger values of F when
compared to the observed PDF. This may be due to too
large a mean flux level ⟨F⟩. It may also reflect the poor
performance of the lognormal model at redshift z < 3 (cf.
Table 1).

The correction factor β_c that quantifies the deviation
from a power law is expected to vary from spectrum to spectrum.
It also probably depends on the rest frame wavelength
λ_rest. Using a constant value of β_c smooths the flux probability
distribution as compared to the "true" PDF, thereby
mimicking the effect of a larger noise. We have found that
the introduction of a 10 per cent scatter (Gaussian deviate)
in the continuum level of z = 3 mock spectra smooths the
flux PDF on a level comparable to a 20 per cent increase
in the noise. At z = 2.4, this corresponds to an even larger
increase in the noise, presumably because the average signal-to-noise
is higher. Therefore, an increase in the mean noise
level can account for both a larger noise per pixel and variations
in the continuum. We will come back to this point
in §6.
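
The link between continuum scatter and an apparent noise excess can be made explicit with a second-moment estimate. This is only a sketch under assumptions that are not part of the analysis above: the true continuum of each spectrum is taken to be (1 + ε) times the adopted one, with ε a zero-mean deviate of rms σ_ε that is independent of the transmitted flux F and of the pixel noise n (rms σ_n, in units of the continuum).

```latex
\begin{align}
F_{\rm meas} &= (1+\epsilon)\,F + n, \\
{\rm Var}(F_{\rm meas}) &= {\rm Var}(F) + \sigma_\epsilon^2\,\langle F^2\rangle + \sigma_n^2, \\
\beta_n^{\rm eq} &= \sqrt{1 + \frac{\sigma_\epsilon^2\,\langle F^2\rangle}{\sigma_n^2}} .
\end{align}
```

Here β_n^eq is the noise rescaling that would produce the same total variance in the absence of continuum scatter. Matching the variance does not guarantee the same PDF shape, so the estimate is indicative only. It does show that, at fixed σ_ε, the equivalent rescaling is larger where σ_n is smaller or ⟨F²⟩ is larger.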



5.4 The best-fitting models 

The agreement between the mock and observed PDFs of low
resolution spectra can be substantially improved with a simultaneous
decrease in the continuum level, and an increase
in the noise per pixel. We will now attempt to quantify the
correction needed in the continuum level and noise estimate.
In spite of the large uncertainties in the actual noise level,
we assume that the true noise σ_true differs from the pipeline
noise by a constant factor, σ_true = β_n σ_p. Similarly, we take
the true continuum in the Lya region to be a scaled version
of the fiducial continuum, C_true = β_c C. We let β_n and
β_c vary in the range 1 < β_n < 1.8 and 0.8 < β_c < 1. We
take 0.65 < ⟨F⟩ < 0.75 and 0.75 < ⟨F⟩ < 0.85 at z = 3
and 2.4, respectively. For each value of β_c, we compute the
PDF of the Lya transmitted flux from the SDSS quasars
with S/N > 4. We store the distribution of noise per pixel
values, δF, as a function of β_c since the average δF increases
with decreasing value of β_c. To compute the mock PDF, we
let the filtering wavenumber k_F and the adiabatic index γ
assume the best-fitting values obtained in §3 as a function
of ⟨F⟩, but for a fixed value of the temperature, T_4 = 1.5.
We compute the flux probability distribution for each choice
of (β_c, β_n, ⟨F⟩). The best-fitting model is obtained
by minimising a χ² statistic as in §3. The shaded region in
Fig. 12 indicates the SDSS data points we use in the calculation
of χ². We discard the data points falling in the ≲10 per
cent lower and ≲10 per cent upper tails of the flux distribution.
We also include in the χ² minimisation the best-fitting
values of χ² as a function of ⟨F⟩ obtained from the measurements
of M00 (cf. §3). The best-fitting values of the parameters
are (β_c, β_n, ⟨F⟩) = (0.87, 1.51, 0.72) and (0.84, 1.55, 0.84),
and correspond to a reduced chi-squared χ²/ν = 1.03 and
1.54 at z = 3 and 2.4 respectively. The best-fitting models
are plotted in Fig. 12 as short and long dashed curves.
Restricting the χ² minimisation to the SDSS data set alone
does not noticeably affect the central values of β_c, β_n and
⟨F⟩.
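
The grid search over (β_c, β_n, ⟨F⟩) can be sketched as below. Everything here is illustrative: the three callables are hypothetical stand-ins for the observed PDF, its errors and the mock PDF; only diagonal errors are used; the tail cut is implemented as a simple cumulative-probability threshold; and the additional χ² contribution from the M00 measurements is omitted.

```python
import itertools


def chi_squared(observed, errors, model, use):
    """Diagonal chi-squared over the bins flagged in `use`."""
    return sum(((o - m) / e) ** 2
               for o, e, m, u in zip(observed, errors, model, use) if u)


def central_bins(pdf, width, tail=0.10):
    """Flag bins lying between the lower and upper `tail` fractions of the PDF."""
    flags, cumulative = [], 0.0
    for p in pdf:
        below = cumulative              # probability below this bin
        cumulative += p * width
        flags.append(below >= tail and cumulative <= 1.0 - tail)
    return flags


def grid_search(observed_for, errors_for, mock_pdf, grids, width):
    """Return (chi2, (beta_c, beta_n, mean_flux)) at the grid minimum.

    observed_for(beta_c) and errors_for(beta_c) give the observed PDF and its
    errors for a continuum rescaling beta_c; mock_pdf(beta_c, beta_n, mean_flux)
    gives the model PDF. All three are supplied by the caller.
    """
    best = None
    for beta_c, beta_n, mean_flux in itertools.product(*grids):
        observed = observed_for(beta_c)
        use = central_bins(observed, width)
        value = chi_squared(observed, errors_for(beta_c),
                            mock_pdf(beta_c, beta_n, mean_flux), use)
        if best is None or value < best[0]:
            best = (value, (beta_c, beta_n, mean_flux))
    return best
```

A brute-force grid is adequate for three parameters with a handful of values each; with finer grids or more parameters a proper minimiser would be the more natural choice.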




Figure 12. The best-fitting Lya flux PDF at redshift z = 3
and 2.4. The short and long dashed curves show the best-fitting
mock PDF at z = 3 and 2.4 respectively. The continuum level,
noise estimate and mean flux parameter ⟨F⟩ have been varied
to obtain the best-fitting models. The symbols with errorbars
show the observed flux PDF of spectra with a signal-to-noise ratio
greater than 4. The shaded region indicates the SDSS data points
used to compute the value of χ². We also include in the chi-squared
the measurements of M00 as described in §3.

6 DISCUSSION 

6.1 Systematics in the measurements 

Section 5.4 argues that the continuum level needs to be lowered by
10-15 per cent, and the pipeline noise increased by ~50 per cent, for
the mock PDF to match the data. This noise correction is significantly
larger than that inferred by McDonald et al. (2006) and Burgess (2004),
who differenced multiple exposures of the same quasar and found that
the SDSS pipeline underestimates the true errors by 5-10 per cent on
average. Although we do not have access to additional exposures, we can
take advantage of the relative smoothness of the spectra redward of the
Lyα emission line, and estimate the noise in, e.g., the rest-frame
wavelength interval 1450-1470 Å as the rms variance $\sigma_f$ of the
flux around the continuum. Then, $\sigma_f$ can be compared to the
average pipeline noise variance, $\sigma_p$, in the same region. The
value of $\sigma_p$ is obtained by squaring the individual pixel noise
estimates, computing the average of these squared values, and taking
the square root. The distribution of ratios $\sigma_f/\sigma_p$ is
shown in Fig. 13 for the individual spectra. The solid histogram
indicates the mean in bins of $\Delta z = 0.2$. The median, which is
less sensitive to outliers, is also shown as the dashed histogram. The
mean ratio does not evolve significantly with redshift, although
Fig. 13 suggests that it might be larger at higher redshift. In the
range $z \sim 2.5$-$4$, $\sigma_f$ is on average 10-15 per cent larger
than $\sigma_p$. This is comparable to the excess noise contribution
inferred by McDonald et al. (2006) and Burgess (2004). It is unclear to
what extent $\sigma_f/\sigma_p$ reflects the excess noise contribution
in the Lyα forest. However, the fact that McDonald et al. (2006) find
the same excess noise power in the region 1268-1380 Å as they do in the
Lyα forest
Figure 13. The distribution of ratios $\sigma_f/\sigma_p$ in the
rest-frame wavelength interval 1450-1470 Å, where $\sigma_f$ is the rms
flux variance, and $\sigma_p$ is the mean SDSS pipeline noise estimate
(see text). The solid and dashed histograms indicate the mean and median
in bins of $\Delta z = 0.2$.



suggests that the fraction of extra noise does not depend
strongly on $\lambda_{\rm rest}$. Consequently, the 50 per cent
increase in the noise level most probably arises from residual
variations in the continua of quasars that are not accounted for by our
spectrum-by-spectrum continuum fitting. In §5.3, we found that the
introduction of a 10 per cent scatter in the continuum level of
$z \lesssim 3$ mock spectra smoothes the flux PDF at a level comparable
to a 20-30 per cent increase in the noise. Hence, we believe that a
reasonable ~20 per cent scatter in the continuum level can account for
the smooth shape of the SDSS PDF if the noise excess correction is
$\lesssim 10$ per cent. To proceed further, one could add another free
parameter describing the residual scatter in the continuum and repeat
the $\chi^2$ minimisation of §5.4. However, given our approximate
characterisation of the continuum and noise level, we have not examined
this issue here.
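
The noise comparison described in this section (rms of the flux about
the continuum versus the root-mean-square of the pipeline pixel errors)
can be sketched as follows. This is only an illustration under stated
assumptions: the function names are hypothetical, the flux, continuum
and pixel errors are assumed to be in the same units and restricted
beforehand to the rest-frame window redward of the Lyα emission line,
and no outlier rejection is applied.

```python
import math


def rms_about_continuum(flux, continuum):
    """rms of the flux residuals about the continuum (sigma_f in the text)."""
    residuals = [f - c for f, c in zip(flux, continuum)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def rms_pipeline_noise(pixel_sigma):
    """Square the pixel noise estimates, average them, take the square root
    (sigma_p in the text)."""
    return math.sqrt(sum(s * s for s in pixel_sigma) / len(pixel_sigma))


def noise_ratio(flux, continuum, pixel_sigma):
    """Ratio sigma_f / sigma_p for one spectrum in the chosen window."""
    return rms_about_continuum(flux, continuum) / rms_pipeline_noise(pixel_sigma)
```

A ratio above unity for a given spectrum would indicate that the
observed scatter exceeds the pipeline estimate in that window; the mean
and median of this ratio in redshift bins are what Fig. 13 summarises.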

Systematic errors arising from continuum fitting in the
measurements of M00 may bias our best-fitting values of
the parameters, and thereby affect the PDF of mock SDSS
spectra. However, a comparison with hydrodynamical simulations
of the Lyα forest indicates that the M00 measurements
are robust to continuum errors for transmitted flux
values $F \lesssim 0.8$. The large redshift range covered by the M00
bins may also affect our results, which have been obtained
using small redshift intervals, $\Delta z = 0.2$, to avoid
dealing with the (poorly constrained) redshift dependence of
the model parameters. Fig. 8 shows that our measurement of
the PDF of SDSS quasars is sensitive to the redshift extent
of the bins. The significance of this effect in measurements of
the PDF from high-resolution quasar spectra is unknown, though
it could easily be estimated from the few tens of spectra
available so far.



6.2 Systematics in the model 

The lognormal model of the IGM allows us to create very
long, realistic mock spectra of the Lyα forest. However, the
model has several shortcomings. It neglects any possible
scatter in the temperature-density relation of the low density
IGM as a result of shocks and inhomogeneous helium
reionization. It also assumes a uniform ultraviolet (UV)
background, and a filtering length that is independent of the
local gas density and temperature. Furthermore, it ignores
any galactic feedback.

Hydrodynamical simulations predict that shock heating
should drive a significant fraction of the baryons into the
warm-hot phase of the intergalactic medium (WHIM) at low
redshift. At the present epoch, this fraction might be as large
as 40 per cent (e.g. Cen & Ostriker 1999; Davé et al. 2001;
see also Nath & Silk 2001). At redshift $z \sim 3$, however, these
simulations indicate that this fraction falls below 10 per cent,
and that most of the WHIM baryons reside in overdensities
$\delta_b \gtrsim 10$ (Davé et al. 2001). Hence, shock heating should
have a rather weak impact on the low density IGM at $z \approx 3$.

Inhomogeneities in the UV background may also affect
the power spectrum and the PDF of the Lyα flux (Zuo 1992;
Fardal & Shull 1993; Croft et al. 2002a). At $z < 4$ however,
fluctuations due to the finite number of sources are only
at the few per cent level because of the small attenuation
length (Croft 2004; Meiksin & White 2004; McDonald et al.
2005b). Recent measurements of the Lyα absorption near
Lyman-break galaxies (Adelberger et al. 2003) are taken as
evidence for the existence of dilute and highly ionised gas
bubbles caused by supernovae-driven winds. Notwithstanding
this, simulations indicate that their small filling factor
results in a moderate impact on statistics of the Lyα forest
such as the power spectrum or the PDF of the transmitted
flux (e.g. Croft et al. 2002a; Weinberg et al. 2003;
Desjacques et al. 2004; McDonald et al. 2005b; Desjacques,
Haehnelt & Nusser 2006). They may, however, have a large
impact on the number and properties of absorption lines
with $N_{\rm HI} \gtrsim 10^{16}\,{\rm cm}^{-2}$ (Theuns, Mo & Schaye 2000; Theuns
et al. 2002a).

The use of a polytropic equation of state and a constant 
filtering length to mimic the temperature and pressure of 
the gas has been shown to produce results comparable to 
detailed hydrodynamical simulations (Petitjean et al. 1995; 
Croft et al. 1998; Gnedin & Hui 1998; Meiksin & White 
2001). Yet, patchy helium reionization can cause significant 
scatter in the temperature-density relation (e.g. Gleser et al. 
2005). Furthermore, in light of the results of Viel, Haehnelt 
& Springel (2006), we expect significant differences between 
the Lyα statistics predicted by full hydrodynamical simulations
and by the lognormal model. Consequently, the
constraints on, e.g., the temperature and adiabatic index 
which can be inferred from the high resolution data should 
be taken with caution. In particular, the linear amplitude $\sigma_L$
is an effective normalisation which cannot be directly related 
to the actual rms variance of the gas distribution. However, 
once the parameters of the lognormal model are constrained 
so as to reproduce the observed Lyα flux power spectrum
and probability distribution measured in M00, the disagreement
seen in Fig. 9 must arise from systematic errors either
in the measurement of the PDF or in the conversion of the
idealised mocks into realistic looking SDSS spectra.



7 CONCLUSION 

We have presented measurements of the probability distribution
of the Lyα transmitted flux in the redshift range
$2.5 \lesssim z \lesssim 4$, from 3492 quasars included in the SDSS DR3
data release. We have compared the measured PDF to predictions
derived from mock spectra, whose statistical properties
have been constrained to match those of high resolution
data. To this end, we have generated very long lognormal
spectra of the Lyα forest, degraded to include
the instrumental noise and resolution of real data. The mock
spectra provide a good match to the Lyα flux power spectrum (PS) and
PDF measured in McDonald et al. (2000) in the region $z \gtrsim 3$.

We have assumed that the quasar continuum follows
the parametric form given in B03. However, unlike B03, we
have allowed the slope of the power-law continuum to
vary from object to object. We measure an average continuum
slope of $\alpha_\nu = 0.59 \pm 0.36$ in the range $2.4 < z < 3.6$, in
good agreement with the mean slope reported by Vanden
Berk et al. (2001), $\alpha_\nu = -0.44$, and with values found in
optically selected samples (e.g. Francis et al. 1991; Natali et
al. 1998). Accounting for the variation in continuum indices has
a significant impact on the mean flux F. We find that F at
redshift $z = 3$ and 2.4 is respectively 3 and 7 per cent higher
than the mean flux measured for a fixed index $\alpha_\nu = -0.44$.
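
To illustrate what a spectrum-by-spectrum power-law fit involves, the
following minimal sketch fits $f_\nu = A\,\nu^{\alpha}$ by unweighted
linear least squares in log-log space. It is not the fitting procedure
used in this paper: the emission-line components of the B03 parametric
form, the pixel errors and the selection of line-free windows are all
ignored, the sign convention for the index is an assumption, and the
function name is hypothetical.

```python
import math


def fit_power_law(nu, flux):
    """Fit flux = A * nu**alpha by least squares on (log nu, log flux).

    Returns (A, alpha). All inputs must be strictly positive; the fit is
    unweighted, which is a simplification for illustration.
    """
    x = [math.log(v) for v in nu]
    y = [math.log(f) for f in flux]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Standard closed-form slope and intercept of a straight-line fit.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    amplitude = math.exp(y_mean - alpha * x_mean)
    return amplitude, alpha
```

Applying such a fit to each quasar separately yields a distribution of
indices, whose mean and scatter can then be compared with a single fixed
index applied to the whole sample.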

Although the model parameters have been adjusted to
reproduce the observed Lyα flux PS and PDF of high resolution
data, the mock SDSS spectra predict a probability
distribution that is significantly different from the PDF we
measure from the SDSS quasar sample. Allowing for a break
in the continuum and, more importantly, for residual scatter
in the continuum level improves the agreement substantially.
We find that the introduction of a 10 per cent scatter in
the continuum level of $z \lesssim 3$ mock spectra smoothes the flux
PDF at a level comparable to a 20-30 per cent increase in
the noise. A combined fit of the SDSS and Keck data indicates
that a decrease of 10-15 per cent in the amplitude of
the power law continuum together with a 20 per cent scatter
can account for the data, provided that the noise excess
correction is $\lesssim 10$ per cent.

Measuring the probability distribution of the transmitted
flux requires a spectrum-by-spectrum treatment of the
quasar continuum, as the latter varies significantly from
quasar to quasar. Furthermore, as we have seen, it is crucial
to account for the slow variation of the continuum slope in
the Lyα region in order to obtain a sensible estimate of the
flux probability distribution. Therefore, it would be desirable
to obtain high resolution exposures of a subsample of
SDSS quasars so as to quantify the errors introduced by the
continuum fitting procedure described in this paper. Alternatively,
Lidz et al. (2005) have suggested working with the estimator defined by
$\delta_f(\omega) = [I_r^{\rm obs}(\omega) - I_R^{\rm obs}(\omega)] / I_R^{\rm obs}(\omega)$,
where $I_r^{\rm obs}$ and $I_R^{\rm obs}$ are the observed flux smoothed
with a Gaussian filter on scale $r \ll R$ and $R \sim 500\,{\rm km\,s^{-1}}$,
respectively. They have demonstrated that the PDF of $\delta_f$ is



insensitive to the shape and normalisation of the continuum.
However, this estimator has an important drawback
as it also smoothes out any feature in the redshift evolution
of the mean optical depth, $\tau_{\rm eff}(z)$, whose scale is larger
than R. Their choice of R corresponds to a redshift interval
$\Delta z \lesssim 0.01$ at $z = 3$. It is therefore unclear whether the
sudden change measured by B03 at $z \sim 3.2$ can be detected in
the statistics of flux estimators other than the transmitted
flux F.
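
The two-scale estimator discussed above can be sketched as follows,
reading it as the flux smoothed on a small scale minus the flux smoothed
on a large scale, divided by the latter. This is a minimal illustration
and not the implementation of Lidz et al.: the smoothing scales are
given in pixels rather than in km/s, the kernel truncation at four
standard deviations and the edge renormalisation are assumptions, and
all function names are hypothetical. A helper converts a velocity scale
into the redshift interval it spans.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def gaussian_smooth(values, sigma_pix):
    """Smooth a 1D sequence with a truncated, renormalised Gaussian kernel."""
    half = max(1, int(math.ceil(4.0 * sigma_pix)))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_pix) ** 2) for k in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalise near the edges instead of padding
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def delta_f(flux, r_pix, R_pix):
    """(small-scale flux - large-scale flux) / large-scale flux, per pixel."""
    small = gaussian_smooth(flux, r_pix)
    large = gaussian_smooth(flux, R_pix)
    return [(s - l) / l for s, l in zip(small, large)]


def redshift_interval(R_kms, z):
    """Redshift interval spanned by a velocity scale R_kms at redshift z."""
    return (1.0 + z) * R_kms / C_KMS
```

Because both smoothed fluxes scale linearly with the input, multiplying
the spectrum by a constant continuum normalisation leaves the estimator
unchanged, which is the property exploited here. For R = 500 km/s at
z = 3 the helper gives a redshift interval of about 0.0067, consistent
with the bound quoted in the text.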